On Multi-Scale Piecewise Stationary Spectral Analysis of Speech Signals for Robust ASR
Authors
Abstract
A fixed-scale (typically 25 ms) short-time spectral analysis of speech signals, which are inherently multi-scale in nature [7] (vowels typically last 40-80 ms, while stops last 10-20 ms), is clearly sub-optimal in terms of time-frequency resolution. In this work, we detect piecewise quasi-stationary speech segments based on the likelihood of each segment, which in turn is estimated from the linear prediction (LP) residual error. A window whose length equals that of the detected quasi-stationary segment is then used to obtain the segment's spectral estimate. This approach adaptively chooses the largest window over which the signal remains quasi-stationary, while excluding the adjoining quasi-stationary segments from that window. Experiments show that features based on the proposed multi-scale piecewise stationary spectral analysis improve recognition accuracy in clean conditions compared with features based on fixed-scale spectral analysis.
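The abstract does not spell out the detector, so the following Python/NumPy sketch only illustrates the idea under stated assumptions. LP coefficients come from the autocorrelation method (Levinson-Durbin recursion), a segment's likelihood is the Gaussian likelihood of its LP residual, and a window is grown block by block while a generalized likelihood ratio (GLR) test stays below a threshold. The GLR criterion is a swapped-in choice, not necessarily the authors' one. The helper names (`lp_coefficients`, `log_likelihood`, `largest_stationary_window`, `segment`, `power_spectrum`), the 8 kHz sampling rate implied by the default sizes (20 ms minimum, 100 ms maximum, 10 ms step), the LP order, the 20-nat threshold and the Hamming window are all hypothetical.

```python
import numpy as np


def lp_coefficients(x, order):
    """Autocorrelation-method LP analysis via the Levinson-Durbin recursion.

    Returns the prediction-error filter a = [1, a_1, ..., a_p], so that the
    LP residual is e[n] = sum_j a_j * x[n - j].
    """
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)]) / n
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) segment: stop the recursion
            break
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def log_likelihood(x, order, floor=1e-12):
    """Gaussian log-likelihood of a segment under its own AR(order) fit, using
    the mean squared LP residual (zero initial conditions) as the variance."""
    a = lp_coefficients(x, order)
    e = np.convolve(x, a)[: len(x)]  # LP residual over every sample of x
    var = max(np.mean(e ** 2), floor)
    return -0.5 * len(x) * (np.log(2.0 * np.pi * var) + 1.0)


def largest_stationary_window(x, start, min_len, max_len, step, order, threshold):
    """Grow the window from `start` in `step`-sample blocks (hypothetical GLR
    criterion) and stop once the next block fits a separate AR model better."""
    length = min_len
    while length + step <= max_len and start + length + step <= len(x):
        left = x[start:start + length]
        right = x[start + length:start + length + step]
        merged = x[start:start + length + step]
        glr = (log_likelihood(left, order) + log_likelihood(right, order)
               - log_likelihood(merged, order))
        if glr > threshold:  # new block belongs to a different segment
            break
        length += step
    return min(length, len(x) - start)


def segment(x, min_len=160, max_len=800, step=80, order=10, threshold=20.0):
    """Split x into consecutive variable-length quasi-stationary segments.
    Defaults assume 8 kHz audio: 20 ms min, 100 ms max, 10 ms step."""
    bounds, start = [], 0
    while start < len(x):
        length = largest_stationary_window(x, start, min_len, max_len,
                                           step, order, threshold)
        bounds.append((start, start + length))
        start += length
    return bounds


def power_spectrum(seg, nfft=1024):
    """Spectral estimate of one segment, windowed over its full length."""
    if len(seg) > nfft:
        raise ValueError("nfft must be at least the segment length")
    w = np.hamming(len(seg))  # window type is an assumption
    return np.abs(np.fft.rfft(seg * w, nfft)) ** 2
```

For example, `segment(x)` on a float NumPy array returns a list of `(start, end)` sample pairs, and `power_spectrum(x[start:end])` gives the spectral estimate for each segment; a larger threshold yields longer windows. The GLR compares one AR model for the extended window against separate models for the current window and the new block, so a large value signals that the new block belongs to the next quasi-stationary segment.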
Similar resources
On variable-scale piecewise stationary spectral analysis of speech signals for ASR
A fixed scale (typically 25ms) short time spectral analysis of speech signals, which are inherently multi-scale in nature (typically vowels last for 40-80ms while stops last for 10-20ms), is clearly sub-optimal for time-frequency resolution. Based on the usual assumption that the speech signal can be modeled by a time-varying autoregressive (AR) Gaussian process, we estimate the largest piecewi...
On Multi-scale Fourier Transform Analysis of Speech Signals
In this paper, we introduce a novel algorithm to perform multi-scale Fourier transform analysis of piecewise stationary signals with application to automatic speech recognition. Such signals are composed of quasi-stationary segments of variable lengths. Therefore, in the proposed algorithm, signals are analyzed with multiple-sized windows. The resulting power spectra are then normalized such that t...
Adaptive Enhancement of Speech Signals for Robust ASR
The behavior of the least squares filter (LeSF) is analyzed for a class of non-stationary signals that are composed of multiple sinusoids whose frequencies and amplitudes may vary from block to block and which are embedded in white noise. Analytic expressions for the weights and the output of the LeSF are derived as a function of the block length and the signal SNR computed over the correspondi...
Least Squares Filtering of Speech Signals for Robust ASR
The behavior of the least squares filter (LeSF) is analyzed for a class of nonstationary signals that either (a) are composed of multiple sinusoids (voiced speech) whose frequencies, phases, and amplitudes may vary from block to block, or (b) are the output of an all-pole filter excited by white noise input (unvoiced speech segments), and which are embedded in white noise. In this work, analytic ...
Multi-Channel Estimation of the Power Spectral Density of Noise for Mixtures of Non-Stationary Signals
The proposed paper deals with the estimation of the power spectral density (PSD) of noise for mixtures of non-stationary wideband signals, exploiting sparseness in the time-frequency domain. The proposed method is applied to realize a beamformer-plus-postfilter structure for noise-robust speech recognition. Experiments with a small-scale 4-sensor microphone array show that interfering speech an...